A reconstruction attack on a private dataset $D$ takes as input some publicly accessible information about the dataset and produces a list of candidate elements of $D$. We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of $D$ from aggregate query statistics $Q(D)\in \mathbb{R}^m$, but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identify theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset $D$ was sampled, demonstrating that they are exploiting information in the aggregate statistics $Q(D)$, and not simply the overall structure of the distribution. In other words, the queries $Q(D)$ are permitting reconstruction of elements of this dataset, not the distribution from which $D$ was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.
translated by 谷歌翻译
我们提供了一种差异化私有算法,用于同时生成多个任务的合成数据:边际查询和多任务机器学习(ML)。我们算法中的一个关键创新是能够直接处理数值特征的能力,与许多相关的先验方法相反,这些方法需要首先通过{binning策略}将数值特征转换为{高基数}分类特征。为了提高准确性,需要较高的分子粒度,但这会对可伸缩性产生负面影响。消除对套在一起的需求使我们能够产生合成数据,以保留大量统计查询,例如数值特征的边际和条件线性阈值查询。保留后者意味着在特定半空间上方的每个类标记的点的比例在实际数据和合成数据中都大致相同。这是在多任务设置中训练线性分类器所需的属性。我们的算法还使我们能够为混合边缘查询提供高质量的合成数据,这些数据结合了分类和数值特征。我们的方法始终比最佳可比技术快2-5倍,并在边缘查询和混合型数据集的线性预测任务方面提供了显着的准确性改进。
translated by 谷歌翻译
我们研究依靠敏感数据(例如医疗记录)的环境的顺序决策中,研究隐私的探索。特别是,我们专注于解决在线性MDP设置中受(联合)差异隐私的约束的增强学习问题(RL),在该设置中,动态和奖励均由线性函数给出。由于Luyo等人而引起的此问题的事先工作。 (2021)实现了$ o(k^{3/5})$的依赖性的遗憾率。我们提供了一种私人算法,其遗憾率提高,最佳依赖性为$ o(\ sqrt {k})$对情节数量。我们强烈遗憾保证的关键配方是策略更新时间表中的适应性,其中仅在检测到数据足够更改时才发生更新。结果,我们的算法受益于低切换成本,并且仅执行$ o(\ log(k))$更新,这大大降低了隐私噪声的量。最后,在最普遍的隐私制度中,隐私参数$ \ epsilon $是一个常数,我们的算法会造成可忽略不计的隐私成本 - 与现有的非私人遗憾界限相比,由于隐私而引起的额外遗憾在低阶中出现了术语。
translated by 谷歌翻译
我们研究私有综合数据生成查询版本,其中目标是构建差异隐私的敏感数据集的消毒版本,这大致保留了大量统计查询的答案。我们首先介绍一个算法框架,统一文献中的长线迭代算法。在此框架下,我们提出了两种新方法。第一种方法,私人熵投影(PEP),可以被视为MWEM的高级变体,可自适应地重复使用过去查询测量以提高精度。我们的第二种方法,具有指数机制(GEM)的生成网络,通过优化由神经网络参数化的生成模型来避免MWEM和PEP等算法中的计算瓶颈,该分布族捕获了丰富的分布系列,同时实现了快速的基于梯度的优化。我们展示了PEP和GEM经验胜过现有算法。此外,我们表明宝石很好地纳入了公共数据的先前信息,同时克服了PMW ^ PUB的限制,现有的现有方法也利用公共数据。
translated by 谷歌翻译
Background: Image analysis applications in digital pathology include various methods for segmenting regions of interest. Their identification is one of the most complex steps, and therefore of great interest for the study of robust methods that do not necessarily rely on a machine learning (ML) approach. Method: A fully automatic and optimized segmentation process for different datasets is a prerequisite for classifying and diagnosing Indirect ImmunoFluorescence (IIF) raw data. This study describes a deterministic computational neuroscience approach for identifying cells and nuclei. It is far from the conventional neural network approach, but it is equivalent to their quantitative and qualitative performance, and it is also solid to adversative noise. The method is robust, based on formally correct functions, and does not suffer from tuning on specific data sets. Results: This work demonstrates the robustness of the method against the variability of parameters, such as image size, mode, and signal-to-noise ratio. We validated the method on two datasets (Neuroblastoma and NucleusSegData) using images annotated by independent medical doctors. Conclusions: The definition of deterministic and formally correct methods, from a functional to a structural point of view, guarantees the achievement of optimized and functionally correct results. The excellent performance of our deterministic method (NeuronalAlg) to segment cells and nuclei from fluorescence images was measured with quantitative indicators and compared with those achieved by three published ML approaches.
translated by 谷歌翻译
The broad usage of mobile devices nowadays, the sensitiveness of the information contained in them, and the shortcomings of current mobile user authentication methods are calling for novel, secure, and unobtrusive solutions to verify the users' identity. In this article, we propose TypeFormer, a novel Transformer architecture to model free-text keystroke dynamics performed on mobile devices for the purpose of user authentication. The proposed model consists in Temporal and Channel Modules enclosing two Long Short-Term Memory (LSTM) recurrent layers, Gaussian Range Encoding (GRE), a multi-head Self-Attention mechanism, and a Block-Recurrent structure. Experimenting on one of the largest public databases to date, the Aalto mobile keystroke database, TypeFormer outperforms current state-of-the-art systems achieving Equal Error Rate (EER) values of 3.25% using only 5 enrolment sessions of 50 keystrokes each. In such way, we contribute to reducing the traditional performance gap of the challenging mobile free-text scenario with respect to its desktop and fixed-text counterparts. Additionally, we analyse the behaviour of the model with different experimental configurations such as the length of the keystroke sequences and the amount of enrolment sessions, showing margin for improvement with more enrolment data. Finally, a cross-database evaluation is carried out, demonstrating the robustness of the features extracted by TypeFormer in comparison with existing approaches.
translated by 谷歌翻译
Digital media have enabled the access to unprecedented literary knowledge. Authors, readers, and scholars are now able to discover and share an increasing amount of information about books and their authors. Notwithstanding, digital archives are still unbalanced: writers from non-Western countries are less represented, and such a condition leads to the perpetration of old forms of discrimination. In this paper, we present the Under-Represented Writers Knowledge Graph (URW-KG), a resource designed to explore and possibly amend this lack of representation by gathering and mapping information about works and authors from Wikidata and three other sources: Open Library, Goodreads, and Google Books. The experiments based on KG embeddings showed that the integrated information encoded in the graph allows scholars and users to be more easily exposed to non-Western literary works and authors with respect to Wikidata alone. This opens to the development of fairer and effective tools for author discovery and exploration.
translated by 谷歌翻译
Content-Controllable Summarization generates summaries focused on the given controlling signals. Due to the lack of large-scale training corpora for the task, we propose a plug-and-play module RelAttn to adapt any general summarizers to the content-controllable summarization task. RelAttn first identifies the relevant content in the source documents, and then makes the model attend to the right context by directly steering the attention weight. We further apply an unsupervised online adaptive parameter searching algorithm to determine the degree of control in the zero-shot setting, while such parameters are learned in the few-shot setting. By applying the module to three backbone summarization models, experiments show that our method effectively improves all the summarizers, and outperforms the prefix-based method and a widely used plug-and-play model in both zero- and few-shot settings. Tellingly, more benefit is observed in the scenarios when more control is needed.
translated by 谷歌翻译
Anticipating future actions based on video observations is an important task in video understanding, which would be useful for some precautionary systems that require response time to react before an event occurs. Since the input in action anticipation is only pre-action frames, models do not have enough information about the target action; moreover, similar pre-action frames may lead to different futures. Consequently, any solution using existing action recognition models can only be suboptimal. Recently, researchers have proposed using a longer video context to remedy the insufficient information in pre-action intervals, as well as the self-attention to query past relevant moments to address the anticipation problem. However, the indirect use of video input features as the query might be inefficient, as it only serves as the proxy to the anticipation goal. To this end, we propose an inductive attention model, which transparently uses prior prediction as the query to derive the anticipation result by induction from past experience. Our method naturally considers the uncertainty of multiple futures via the many-to-many association. On the large-scale egocentric video datasets, our model not only shows consistently better performance than state of the art using the same backbone, and is competitive to the methods that employ a stronger backbone, but also superior efficiency in less model parameters.
translated by 谷歌翻译
The proliferation of radical online communities and their violent offshoots has sparked great societal concern. However, the current practice of banning such communities from mainstream platforms has unintended consequences: (I) the further radicalization of their members in fringe platforms where they migrate; and (ii) the spillover of harmful content from fringe back onto mainstream platforms. Here, in a large observational study on two banned subreddits, r/The\_Donald and r/fatpeoplehate, we examine how factors associated with the RECRO radicalization framework relate to users' migration decisions. Specifically, we quantify how these factors affect users' decisions to post on fringe platforms and, for those who do, whether they continue posting on the mainstream platform. Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform. Whereas social-level factors, users' connection with the radical community, only affect the propensity to be coactive on both platforms. Overall, our findings pave the way for evidence-based moderation policies, as the decisions to migrate and remain coactive amplify unintended consequences of community bans.
translated by 谷歌翻译